Introduction to Numerical Analysis 2

Part b–Lecture 5

We have implemented the conjugate gradient method in Lecture b2 to find an approximate solution of the system Ax = b. With a good preconditioner like the one in Lecture b3, it is one of the fastest available methods to solve certain large linear systems. It is commonly used in practice to solve large linear systems that come from a discretization of partial differential equations.

But there is one more thing that we must do to speed up the method for typical matrices A. By far the most demanding operation in each iteration of the CG algorithm is the matrix-vector multiplication Ap. It requires on the order of n^2 operations for a matrix of size n \times n.
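To see where the n^2 comes from, here is a minimal sketch of a dense matrix-vector product (the actual mv_mul from Lecture b2 may differ in details): every entry of A is visited once.

function mv_mul_sketch(a, x) result(y)
    ! a sketch (not necessarily the mv_mul from Lecture b2): multiply
    ! a dense matrix a by the vector x; each entry of a contributes
    ! one multiplication and one addition, so about n^2 of each
    real, intent(in) :: a(:, :), x(:)
    real :: y(size(a, 1))
    integer :: i, j

    y = 0
    do j = 1, size(a, 2)
        do i = 1, size(a, 1)
            y(i) = y(i) + a(i, j) * x(j)
        end do
    end do
end function mv_mul_sketch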

The good news is that quite often in practice the matrix A is sparse (疎行列): most of the entries of A are zero.

Example of a sparse matrix from Report 1: most entries are zero (green squares)

These zero entries do not contribute to the value of Ap, so we are only wasting time computing products and sums with zero operands: for a computer, a multiplication by zero takes exactly as long as a multiplication by any other number, and the same holds for addition.

Example 1. The matrix A from Lecture b3 has on average 3 nonzero entries per row or column. The total number of nonzero entries is 3n - 2 for a matrix of size n \times n. Computing Ap therefore takes about 3n multiplications (and a similar number of additions). This is significantly smaller than the n^2 operations required for a general matrix.

Example 2. The usual finite difference scheme for Poisson’s equation -\Delta u = f in d dimensions leads to a linear system with a matrix A with at most 2d + 1 nonzero elements per row; for d = 2 this is the familiar five-point stencil, so each row has at most 5 nonzero entries. This is a very common linear system to solve in practice.
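As an illustration of the d = 2 case (a sketch, not code from the lecture), the five-point stencil can be applied directly to a grid function u stored as a two-dimensional array, without ever forming the matrix A:

function stencil2d(u) result(v)
    ! a sketch: apply the five-point stencil for the negative Laplacian
    ! (d = 2) to the interior of the grid function u, assuming zero
    ! boundary values; each output value uses at most 2d + 1 = 5
    ! entries of u
    real, intent(in) :: u(:, :)
    real :: v(size(u, 1), size(u, 2))
    integer :: i, j

    v = 0
    do j = 2, size(u, 2) - 1
        do i = 2, size(u, 1) - 1
            v(i, j) = 4 * u(i, j) - u(i-1, j) - u(i+1, j) &
                      - u(i, j-1) - u(i, j+1)
        end do
    end do
end function stencil2d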

So if the matrix A is sparse with only a few nonzero elements per row, we can potentially speed up the program by a factor of about n if we perform only the necessary operations!

Not only that, we also do not need to store the zero elements in memory at all, and therefore need only on the order of n units of memory instead of n^2. That is n times less memory! These are huge savings for larger n, and they allow us to solve much larger systems on a computer.
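To make the memory savings concrete (a sketch; general sparse storage formats are the topic of the coming lectures), a tridiagonal matrix can be stored as three arrays of total length about 3n instead of an n \times n array:

function tridiag_mul(dl, d, du, x) result(y)
    ! a sketch: y = Ax for a general tridiagonal matrix stored by its
    ! diagonals: dl (subdiagonal, length n - 1), d (diagonal, length n),
    ! du (superdiagonal, length n - 1); about 3n reals instead of n^2
    real, intent(in) :: dl(:), d(:), du(:), x(:)
    real :: y(size(x))
    integer :: i, n

    n = size(x)
    y(1) = d(1) * x(1) + du(1) * x(2)
    do i = 2, n - 1
        y(i) = dl(i-1) * x(i-1) + d(i) * x(i) + du(i) * x(i+1)
    end do
    y(n) = dl(n-1) * x(n-1) + d(n) * x(n)
end function tridiag_mul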


Let us consider how to do this for the simple (but important) matrix from Lecture b3 with \gamma = 2: A_{ij} := \begin{cases} 2, & i = j,\\ -1, & |i - j| = 1,\\ 0, & \text{otherwise}. \end{cases}

Up to now we have been using the function mv_mul, which accepts an arbitrary matrix A. Let us write a function lap_mul¹ that multiplies a given vector x by the specific matrix A above.

We note that the formula for Ax simplifies to the following: \begin{aligned} y_1 = (Ax)_1 &= 2x_1 - x_2,\\ y_2 = (Ax)_2 &= -x_1 + 2x_2 - x_3,\\ & \vdots\\ y_i = (Ax)_i &= -x_{i-1} + 2x_i - x_{i+1},\\ & \vdots\\ y_{n-1} = (Ax)_{n-1} &= -x_{n-2} + 2x_{n-1} - x_n,\\ y_n = (Ax)_n &= -x_{n-1} + 2x_n. \end{aligned}

Taking advantage of the known fixed elements of A, we do not need to store A in memory and can implement Ax directly as follows:

implicit none
real, dimension(:), allocatable :: x, y
integer :: i, n

read *, n

allocate(x(n))

! fill x with some test values
do i = 1, n
    x(i) = i**2
end do

! note that y does not need to be allocated in the main program:
! the assignment below allocates it automatically
y = lap_mul(x)

print *, y

contains
    ! computes y = Ax for the matrix A above (2 on the diagonal,
    ! -1 on the sub- and superdiagonals) without storing A;
    ! assumes n >= 2
    function lap_mul(x) result(y)
        real, intent(in) :: x(:)
        real :: y(size(x))
        integer :: i, n

        n = size(x)

        y(1) = 2 * x(1) - x(2)

        do i = 2, n - 1
            y(i) = -x(i-1) + 2 * x(i) - x(i+1)
        end do

        y(n) = -x(n-1) + 2 * x(n)
    end function lap_mul
end
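As a quick sanity check of lap_mul: for x_i = i^2 the interior entries satisfy y_i = -x_{i-1} + 2x_i - x_{i+1} = -((i-1)^2 - 2i^2 + (i+1)^2) = -2, the negative second difference of i^2, and y_1 = 2 \cdot 1 - 4 = -2 as well. So for n = 5 the program should print -2 four times followed by y_5 = -16 + 50 = 34.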

Exercise 1.

Write two programs in Fortran that perform x \leftarrow Ax for the matrix A above 1000 times, but do not print anything. Use x_i = 0 as the initial value for x. For one program, use mv_mul and for the other use lap_mul. Compare the run times for n = 500 and n = 1000. You can also try n = 1\,000\,000 for the program with lap_mul.
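One possible skeleton for the lap_mul variant (a sketch, not the complete required solution; the mv_mul variant is analogous):

implicit none
real, dimension(:), allocatable :: x
integer :: i, n

read *, n
allocate(x(n))
x = 0                    ! initial value x_i = 0

! apply x <- Ax 1000 times; nothing is printed
do i = 1, 1000
    x = lap_mul(x)
end do

contains
    ! same function as in the program above
    function lap_mul(x) result(y)
        real, intent(in) :: x(:)
        real :: y(size(x))
        integer :: i, n

        n = size(x)
        y(1) = 2 * x(1) - x(2)
        do i = 2, n - 1
            y(i) = -x(i-1) + 2 * x(i) - x(i+1)
        end do
        y(n) = -x(n-1) + 2 * x(n)
    end function lap_mul
end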

Hint. To measure the run time of a program, use the command time ./a.exe or time ./a.out. If your program reads the value of n from the standard input (keyboard), run it as

time ./a.exe <<< 1000

so that the time it takes to enter the value of n (= 1000 in this example) is not measured.

Exercise 2.

Modify the program for the preconditioned CG from Lecture b3 to use lap_mul instead of mv_mul.

There is no need for gam and a, so remove these variables.

The program should read n and \omega from the input and print the norm of the residual for every CG iteration as in the original program.

Also modify the function precond so that it does not use a but directly uses the known elements of A. That is, replace the function by precond_lap, declared as:

function precond_lap(r, om) result(z)

The implementation of this function is the same as the original precond, with every access to the array a replaced by the corresponding known entry of A: 2 on the diagonal and -1 next to it.
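The original precond from Lecture b3 is not reproduced here, so the following is only a sketch under an explicit assumption: that precond applies an SSOR-type preconditioner with relaxation parameter \omega (plausible, given the values of \omega close to 2 used below). If so, precond_lap could look like this:

function precond_lap(r, om) result(z)
    ! SSOR sketch (assumption: the Lecture b3 preconditioner is of
    ! SSOR type; the actual precond may differ), using the known
    ! entries of A: 2 on the diagonal, -1 on the off-diagonals
    real, intent(in) :: r(:), om
    real :: z(size(r))
    integer :: i, n

    n = size(r)

    ! forward sweep: solve (D + om*L) z = r with D = 2I
    z(1) = r(1) / 2
    do i = 2, n
        z(i) = (r(i) + om * z(i-1)) / 2
    end do

    ! diagonal scaling: z <- om * (2 - om) * D * z
    z = om * (2 - om) * 2 * z

    ! backward sweep: solve (D + om*U) z = z
    z(n) = z(n) / 2
    do i = n - 1, 1, -1
        z(i) = (z(i) + om * z(i+1)) / 2
    end do
end function precond_lap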

Try to run it for n = 1000, 10\,000 and 100\,000 with \omega = 1.9 and \omega = 1.9999. Run with n = 1\,000\,000 and \omega = 1.999999. Measure the run times and the number of CG iterations to confirm that this version is significantly faster than the original implementation.

Submit the code to LMS.

In the coming lectures, we will learn how to handle general sparse matrices.


  1. lap refers to the Laplace operator. This is where the matrix A comes from: the usual finite difference discretization of -\Delta u(x) = -\frac{d^2 u}{d x^2}(x) \approx -u(x - 1) + 2u(x) - u(x + 1) in one dimension with discretization step 1.